Word Embeddings in PyTorch

A few quick notes on how to use embeddings in PyTorch and in deep learning programming in general.

We also need to define an index for each word when using embeddings.

Each word is represented by an index (an integer).

A dictionary, word_to_ix, maps each word to its index.
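A minimal sketch of building such a dictionary, assuming a whitespace-tokenized toy sentence (the sentence is made up for illustration; only the name word_to_ix comes from the notes above):

```python
# Toy corpus; the sentence is only an illustration.
sentence = "the quick brown fox jumps over the lazy dog".split()

# Assign each distinct word the next free integer index,
# in order of first appearance.
word_to_ix = {}
for word in sentence:
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)

print(word_to_ix["the"])  # 0
print(word_to_ix["dog"])  # 7
print(len(word_to_ix))    # 8 distinct words, so the vocabulary size is 8
```

The repeated word "the" keeps the index it got the first time, so every word maps to exactly one integer.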

Embeddings are stored as a |V| × D matrix, where |V| is the vocabulary size and D is the dimensionality of the embeddings, such that the word assigned index i has its embedding stored in the i-th row of the matrix.

torch.nn.Embedding

It takes two arguments: the vocabulary size and the dimensionality of the embeddings.

These are the |V| and D described above.
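To make the correspondence concrete, here is a small sketch. The sizes 10 and 3 are arbitrary values chosen for illustration. The table is exposed as the module's weight attribute, and looking up index i returns row i of that table.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 10, 3  # |V| and D; arbitrary illustrative values
embedding = nn.Embedding(vocab_size, embedding_dim)

# The lookup table is a |V| x D parameter matrix.
print(embedding.weight.shape)  # torch.Size([10, 3])

# Looking up index i returns row i of that matrix.
i = 4
looked_up = embedding(torch.tensor([i]))[0]
print(torch.equal(looked_up, embedding.weight[i]))  # True
```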

(Pretrained weights can also be loaded with from_pretrained!)
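A hedged sketch of what that looks like, assuming the pretrained vectors are already available as a 2-D float tensor. The values below are made up and simply stand in for real pretrained vectors.

```python
import torch
import torch.nn as nn

# Stand-in for real pretrained vectors; the values are made up.
pretrained = torch.tensor([[1.0, 2.0, 3.0],
                           [4.0, 5.0, 6.0]])

# |V| and D are inferred from the tensor's shape: here |V| = 2 and D = 3.
embedding = nn.Embedding.from_pretrained(pretrained)

print(embedding(torch.tensor([1])))    # tensor([[4., 5., 6.]])
print(embedding.weight.requires_grad)  # False: frozen by default (freeze=True)
```

Passing freeze=False instead would leave the loaded vectors trainable, which is a design choice that depends on whether the embeddings should be fine-tuned.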

To index into this table, you must use torch.LongTensor (since the indices are integers, not floats).
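As a small illustration of this step, the sketch below turns a word into an integer tensor before the lookup. The two-word vocabulary and the embedding size of 5 are assumptions made for the example.

```python
import torch
import torch.nn as nn

word_to_ix = {"hello": 0, "world": 1}         # toy vocabulary for illustration
embedding = nn.Embedding(len(word_to_ix), 5)  # 2 words, 5-dimensional embeddings

# Wrap the integer index in a tensor with a long (64-bit integer) dtype.
lookup_tensor = torch.tensor([word_to_ix["hello"]], dtype=torch.long)
print(lookup_tensor.dtype)  # torch.int64

hello_embed = embedding(lookup_tensor)
print(hello_embed.shape)    # torch.Size([1, 5])
```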

code:python
 >>> # Example from the torch.nn.Embedding documentation
 >>> import torch
 >>> import torch.nn as nn
 >>> torch.manual_seed(1)
 >>> embedding = nn.Embedding(10, 3)
 >>> inputs = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])  # size (2, 4)
 >>> outputs = embedding(inputs)
 >>> outputs.size()  # inputs has 2 rows, so dimension 0 of the size is 2
 torch.Size([2, 4, 3])
 >>> outputs[0].size()  # 4 words (indices), each a 3-dimensional embedding, hence (4, 3)
 torch.Size([4, 3])
 >>> outputs[0]
 tensor([[-1.6095, -0.1002, -0.6092],  # word with index 1
         [-0.9798, -1.6091, -0.7121],  # word with index 2
         [-0.2223,  1.6871, -0.3206],  # word with index 4
         [-0.2993,  1.8793, -0.0721]], grad_fn=<SelectBackward0>)
 >>> outputs[1]
 tensor([[-0.2223,  1.6871, -0.3206],  # word with index 4 (same vector as in outputs[0])
         [ 0.3037, -0.7773, -0.2515],  # word with index 3
         [-0.9798, -1.6091, -0.7121],  # word with index 2 (same vector as in outputs[0])
         [-0.0288,  2.3571, -1.0373]], grad_fn=<SelectBackward0>)

The code example continues in the note on the padding_idx argument of torch.nn.Embedding.